Improving English-Bulgarian Statistical Machine Translation by Phrasal Verb Treatment

نویسندگان

  • Iliana Simova
  • Valia Kordoni
چکیده

This work describes an experimental evaluation of the significance of phrasal verb treatment for obtaining better quality statistical machine translation (SMT) results. Phrasal verbs are multiword expressions used frequently in English, independent of the domain and degree of formality of language. They are challenging for natural language processing due to their idiosyncratic semantic and syntactic properties. The meaning of phrasal verbs is often not directly derivable from the semantics of their constituent tokens. In addition, they are hard to identify in text because of their flexible structure and due to ambiguous prepositional phrase attachments. The importance of the detection and special treatment of phrasal verbs is measured in the context of SMT, where the wordfor-word translation of these units often produces incoherent results. Two ways of integrating phrasal verb information in a phrase-based SMT system are presented. Automatic and manual evaluations of the results reveal improvements in the translation quality in both experiments.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

استفاده از تجزیه گرهای احتمالاتی زبان طبیعی جهت بهبود ترجمه افعال گروهی انگلیسی به فارسی

Machine translation of English sentences faces a big problem when it deals with phrasal verbs. Phrasal verb is a common structure occurring in English as a combination of a verb and a preposition, a verb and an adverb, or a verb with both an adverb and a preposition. Meaning of a phrasal verb is not compositional. The second part of the phrasal verbs which often is a preposition is called parti...

متن کامل

Using Word Embeddings for Improving Statistical Machine Translation of Phrasal Verbs

We examine the employment of word embeddings for machine translation (MT) of phrasal verbs (PVs), a linguistic phenomenon with challenging semantics. Using word embeddings, we augment the translation model with two features: one modelling distributional semantic properties of the source and target phrase and another modelling the degree of compositionality of PVs. We also obtain paraphrases to ...

متن کامل

Better Statistical Machine Translation through Linguistic Treatment of Phrasal Verbs

This article describes a linguistically informed method for integrating phrasal verbs into statistical machine translation (SMT) systems. In a case study involving English to Bulgarian SMT, we show that our method does not only improve translation quality but also outperforms similar methods previously applied to the same task. We attribute this to the fact that, in contrast to previous work on...

متن کامل

Multiword Expressions in Machine Translation

This work describes an experimental evaluation of the significance of phrasal verb treatment for obtaining better quality statistical machine translation (SMT) results. The importance of the detection and special treatment of phrasal verbs is measured in the context of SMT, where the word-for-word translation of these units often produces incoherent results. Two ways of integrating phrasal verb...

متن کامل

Improving statistical machine translation by classifying and generalizing inflected verb forms

This paper introduces a rule-based classification of single-word and compound verbs into a statistical machine translation approach. By substituting verb forms by the lemma of their head verb, the data sparseness problem caused by highly-inflected languages can be successfully addressed. On the other hand, the information of seen verb forms can be used to generate new translations for unseen ve...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013